
    Causal Inference by Stochastic Complexity

    The algorithmic Markov condition states that the most likely causal direction between two random variables X and Y can be identified as the direction with the lowest Kolmogorov complexity. Due to the halting problem, however, this notion is not computable. We hence propose to do causal inference by stochastic complexity. That is, we propose to approximate Kolmogorov complexity via the Minimum Description Length (MDL) principle, using a score that is minimax optimal with regard to the model class under consideration. This means that even in an adversarial setting, such as when the true distribution is not in this class, we still obtain the optimal encoding for the data relative to the class. We instantiate this framework, which we call CISC, for pairs of univariate discrete variables, using the class of multinomial distributions. Experiments show that CISC is highly accurate on synthetic, benchmark, and real-world data, outperforming the state of the art by a margin, and scales extremely well with regard to sample and domain sizes.
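
    As a rough illustration of scoring by stochastic complexity, the sketch below computes the normalized maximum likelihood (NML) code length of a discrete sample under the multinomial class and compares the two causal directions. It is a minimal sketch under stated assumptions, not the authors' implementation: all function names are hypothetical, the conditional score is assumed to be the sum of per-value complexities, and the normalizer uses the known linear-time recurrence C(n, k+2) = C(n, k+1) + (n/k) C(n, k).

```python
import math
from collections import Counter, defaultdict


def log2_multinomial_normalizer(n, k):
    """log2 of the multinomial NML normalizing sum C(n, k).

    Uses the recurrence C(n, k+2) = C(n, k+1) + (n / k) * C(n, k).
    Plain floats are used, so this sketch suits moderate n only.
    """
    if n == 0 or k <= 1:
        return 0.0
    c_prev = 1.0  # C(n, 1)
    c_curr = sum(  # C(n, 2), computed directly from its definition
        math.comb(n, h) * (h / n) ** h * ((n - h) / n) ** (n - h)
        for h in range(n + 1)
    )
    for j in range(1, k - 1):
        c_prev, c_curr = c_curr, c_curr + (n / j) * c_prev
    return math.log2(c_curr)


def stochastic_complexity(values, k):
    """NML code length in bits: maximum-likelihood cost plus regret."""
    n = len(values)
    counts = Counter(values)
    neg_log_likelihood = -sum(c * math.log2(c / n) for c in counts.values())
    return neg_log_likelihood + log2_multinomial_normalizer(n, k)


def conditional_complexity(targets, conditions, k):
    """Assumed conditional score: sum of complexities per condition value."""
    groups = defaultdict(list)
    for t, c in zip(targets, conditions):
        groups[c].append(t)
    return sum(stochastic_complexity(g, k) for g in groups.values())


def direction_scores(xs, ys):
    """Return (bits for X -> Y, bits for Y -> X); the lower score wins."""
    kx, ky = len(set(xs)), len(set(ys))
    x_to_y = stochastic_complexity(xs, kx) + conditional_complexity(ys, xs, ky)
    y_to_x = stochastic_complexity(ys, ky) + conditional_complexity(xs, ys, kx)
    return x_to_y, y_to_x
```

    Under these assumptions, one would infer X → Y when the first score is smaller, Y → X when the second is smaller, and remain undecided when they are equal.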

    Evaluating the Fairness of Discriminative Foundation Models in Computer Vision

    We propose a novel taxonomy for bias evaluation of discriminative foundation models, such as Contrastive Language-Image Pretraining (CLIP), that are used for labeling tasks. We then systematically evaluate existing methods for mitigating bias in these models with respect to our taxonomy. Specifically, we evaluate OpenAI's CLIP and OpenCLIP models for key applications, such as zero-shot classification, image retrieval, and image captioning. We categorize desired behaviors along three axes: (i) whether the task concerns humans; (ii) how subjective the task is (i.e., how likely it is that people from a diverse range of backgrounds would agree on a labeling); and (iii) the intended purpose of the task and whether fairness is better served by impartiality (i.e., making decisions independent of the protected attributes) or representation (i.e., making decisions to maximize diversity). Finally, we provide quantitative fairness evaluations for both binary-valued and multi-valued protected attributes over ten diverse datasets. We find that fair PCA, a post-processing method for fair representations, works very well for debiasing in most of the aforementioned tasks while incurring only a minor loss of performance. However, different debiasing approaches vary in their effectiveness depending on the task. Hence, one should choose the debiasing approach depending on the specific use case. Comment: Accepted at AIES'2

    Causal Inference on Event Sequences

    Given two discrete-valued time series (that is, event sequences) of length n, can we tell whether they are causally related? That is, can we tell whether x^n causes y^n, or whether y^n causes x^n? Can we do so without having to make assumptions on the distribution of these time series, or about the lag of the causal effect? And, importantly for practical application, can we do so accurately and efficiently? These are exactly the questions we answer in this paper. We propose a causal inference framework for event sequences based on information theory. We build upon the well-known notion of Granger causality, and define causality in terms of compression. We infer that x^n is likely a cause of y^n if y^n can be (much) better sequentially compressed given the past of both y^n and x^n than the other way around. To compress the data we use the notion of sequential normalized maximal likelihood, which means we use minimax optimal codes with respect to a parametric family of distributions. To show this works in practice, we propose CUTE, a linear time method for inferring the causal direction between two event sequences. Empirical evaluation shows that CUTE works well in practice, is much more robust than transfer entropy, and ably reconstructs the ground truth on river flow and spike train data.

    MDL for Causal Inference on Discrete Data


    Correlation by Compression


    Origo: Causal Inference by Compression

    Causal inference from observational data is one of the most fundamental problems in science. In general, the task is to tell whether it is more likely that X caused Y, or vice versa, given only data over their joint distribution. In this paper we propose a general inference framework based on Kolmogorov complexity, as well as a practical and computable instantiation based on the Minimum Description Length (MDL) principle. Simply put, we propose causal inference by compression. That is, we infer that X is a likely cause of Y if we can better compress the data by first encoding X, and then encoding Y given X, than in the other direction. To show this works in practice, we propose Origo, an efficient method for inferring the causal direction from binary data. Origo employs the lossless Pack compressor (Tatti & Vreeken, 2008) and searches for the set of decision trees that encodes the data most succinctly. Importantly, it works directly on the data and does not require assumptions about either distributions or the type of causal relations. To evaluate Origo in practice, we provide extensive experiments on synthetic, benchmark, and real-world data, including three case studies. Altogether the experiments show that Origo reliably infers the correct causal direction in a wide range of settings.

    Accurate Causal Inference on Discrete Data

    Additive Noise Models (ANMs) provide a theoretically sound approach to inferring the most likely causal direction between pairs of random variables given only a sample from their joint distribution. The key assumption is that the effect is a function of the cause, with additive noise that is independent of the cause. In many cases ANMs are identifiable. Their performance, however, hinges on the chosen dependence measure and on the assumptions we make about the true distribution. In this paper we propose to use Shannon entropy to measure the dependence within an ANM, which gives us a general approach by which we do not have to assume a true distribution, nor have to perform explicit significance tests during optimization. The information-theoretic formulation gives us a general, efficient, identifiable, and, as the experiments show, highly accurate method for causal inference on pairs of discrete variables, achieving (near) 100% accuracy on both synthetic and real data.
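
    To make the entropy-based ANM idea concrete, the following sketch scores each direction as the entropy of the assumed cause plus the entropy of the residual noise. It is a simplified illustration, not the paper's method: the function names are hypothetical, the data are assumed to be integer-valued, the scoring rule H(cause) + H(residual) is an assumption about the formulation, and the regression function is fixed to the conditional mode rather than optimized.

```python
import math
from collections import Counter, defaultdict


def entropy(values):
    """Empirical Shannon entropy of a sample, in bits."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())


def residual_entropy(cause, effect):
    """Entropy of effect - f(cause), with f fixed to the conditional mode.

    The conditional mode is a stand-in for a properly optimized function;
    a real method would search for the f that minimizes this entropy.
    """
    by_cause = defaultdict(Counter)
    for c, e in zip(cause, effect):
        by_cause[c][e] += 1
    f = {c: counts.most_common(1)[0][0] for c, counts in by_cause.items()}
    residuals = [e - f[c] for c, e in zip(cause, effect)]
    return entropy(residuals)


def anm_scores(xs, ys):
    """Return (score for X -> Y, score for Y -> X); lower is preferred."""
    x_to_y = entropy(xs) + residual_entropy(xs, ys)
    y_to_x = entropy(ys) + residual_entropy(ys, xs)
    return x_to_y, y_to_x
```

    With this sketch, a pair whose effect is an exact one-to-one function of the cause yields equal scores in both directions, which matches the intuition that such pairs carry no directional signal.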